LW06 correlations testing & demo

04/01/21 PH

Demo of correlation & covariance imaging between electron VMI data and ion ToF data (LW06 test dataset). Run from pre-processed data, with additional filtering applied.

For dev work see LW06_correlations_tests_291220.ipynb.

29/12/20 v2

Implemented covariance VMI analysis routines (includes filtering).

10/12/20 v1

Basic correlation coefficient for ion and electron channels (raw data, only gas on/off filtering).

Setup

In [1]:
import numpy as np

from h5py import File

from pathlib import Path

# HV imports
import holoviews as hv
from holoviews import opts
hv.extension('bokeh', 'matplotlib')

import xarray as xr


# Memory profiler - OPTIONAL for testing
# https://timothymonteath.com/articles/monitoring_memory_usage/
# %load_ext memory_profiler
# %memit


# Quick hack for slow internet connection!
%autosave 36000

# Also image stacks can be problematic
showImgStacks = False

# Import class - this requires pointing to the `tmoDataBase.py` file.
# See https://github.com/phockett/tmo-dev

# Import class from file
import sys
sys.path.append(r'/cds/home/p/phockett/dev/tmo-dev/tmo')
import tmoDataBase as tb
# from vmi import VMI  # VMI class - currently has multiple filtering, ish.
# from inversion import VMIproc  # VMI processing class
from correlations import corr  # Correlations class

tb.setPlotDefaults()  # Set plot defaults (optional)

Autosaving every 36000 seconds

Read data

In [2]:
# Set main data path
baseDir = Path('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v5')
keys = range(81,87+1)

# Setup class object and which runs to read.
# data = VMIproc(fileBase = baseDir, runList = range(81,87+1), fileSchema='_v5')
# data = VMIproc(fileBase = baseDir, runList = keys, fileSchema='_v5')  # Testing with single dataset
# data = VMI(fileBase = baseDir, runList = keys, fileSchema='_v5')  # Testing with single dataset
data = corr(fileBase = baseDir, runList = keys, fileSchema='_v5')  # Testing with multiple datasets

# The class has a bunch of dictionaries holding info on the data
# data.runs['files']

# Read in the files with h5py
# There is very basic checking implemented currently, just to see if 'energies' is present.
data.readFiles()
Read 7 files.
Good datasets: [81, 82, 83, 84, 85, 86, 87]
Invalid datasets: []

Mass spectra

For e-i correlations, use the data in the `intensities` variable, which corresponds to the ion ToF bins in the `ts` variable.

In [147]:
data.data[81]['raw']['intensities'].shape # Ion ToF processed data, shots x M/Q bins
Out[147]:
(73658, 5)
In [153]:
np.array(data.data[81]['raw']['ts'])  # Ion tof bins
Out[153]:
array([7.675, 6.365, 4.415, 3.185, 9.5  ])

Ion ToF bins in the pre-processed data correspond to:

  1. 7.675: last weak feature
  2. 6.365: penultimate weak feature
  3. 4.415: main peak
  4. 3.185: first peak
  5. 9.5: no signal

Possibly not the best binning, but it will do for testing, and it might be good to be blind here, with no preconceptions about the correlations. Bin 4 (the 9.5, no-signal bin) should be a good test of the correlations, since it should result in a blank image.
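As a minimal sketch of this kind of e-i correlation test (on synthetic stand-in data, not the LW06 dataset): stack a per-shot electron count against the per-shot ion bin intensities and run np.corrcoef over the lot.

```python
import numpy as np

rng = np.random.default_rng(0)
nShots = 1000

# Synthetic per-shot ion intensities for 5 ToF bins (hypothetical stand-in data).
intensities = rng.poisson(5, size=(nShots, 5)).astype(float)

# Fake electron counts, coupled to ion bin 2 only.
eShot = 2.0 * intensities[:, 2] + rng.normal(0, 1, nShots)

# Stack electron counts with the ion bins and compute the full correlation matrix.
stacked = np.vstack([eShot, intensities.T])   # shape (6, nShots)
corrMat = np.corrcoef(stacked)                # shape (6, 6)

# Row 0 (minus the diagonal term) gives the e-i correlation per ToF bin.
eiCorr = corrMat[0, 1:]
```

Bin 2 then shows a strong correlation, while the uncoupled bins (like the no-signal bin in the real data) sit near zero.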

Filter

In [3]:
# Simple filter - set for gas on only
# TODO: implement multiple filter sets here.
# data.setFilter(reset = True)
data.setFilter({'gas on':{'evrs':[1,1,70]}}, reset = True)  # , 'gas off':{'evrs':[0,0,70]}})
data.filterData()
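The filtering step can be sketched standalone as a per-shot boolean mask over [min, max] ranges (a hypothetical minimal stand-in, not the class implementation, which also handles event codes and named filter sets):

```python
import numpy as np

def maskFromFilters(shotData, filters):
    """Build a boolean per-shot mask from {key: [min, max]} ranges.
    Hypothetical minimal stand-in for the class filtering logic."""
    nShots = next(iter(shotData.values())).shape[0]
    mask = np.ones(nShots, dtype=bool)
    for key, (lo, hi) in filters.items():
        mask &= (shotData[key] >= lo) & (shotData[key] <= hi)
    return mask

# Example: keep gas-on shots (flag == 1) with electron counts < 2000.
shotData = {'gas': np.array([1, 0, 1, 1, 0]),
            'eShot': np.array([500, 100, 2500, 1900, 50])}
mask = maskFromFilters(shotData, {'gas': [1, 1], 'eShot': [0, 2000]})
# mask can then be applied to any per-shot array, e.g. intensities[mask, :].
```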

Calculate correlation matrices (per run)

In [4]:
# Run corrRun() to calculate covariance matrices for each run
data.corrRun()
Set self.ndoverlay, self.hmap and self.layout for dim=0.
Set self.ndoverlay, self.hmap and self.layout for dim=['intensities', 'eShot'].
In [5]:
data.hmap
WARNING:param.HeatMapPlot04584: Due to internal constraints, when aspect and width/height is set, the bokeh backend uses those values as frame_width/frame_height instead. This ensures the aspect is respected, but means that the plot might be slightly larger than anticipated. Set the frame_width/frame_height explicitly to suppress this warning.
Out[5]:
In [6]:
data.hmap.layout().cols(2)
Out[6]:

Sanity check - for gas off.

In [7]:
# data.setFilter()
data.setFilter({'gas off':{'evrs':[0,0,70]}}, reset = True)  # , 'gas off':{'evrs':[0,0,70]}})
data.filterData()
data.corrRun()
Set self.ndoverlay, self.hmap and self.layout for dim=2.
Set self.ndoverlay, self.hmap and self.layout for dim=['intensities', 'eShot'].
In [8]:
data.hmap
Out[8]:

Still see some correlations here, although quite different: presumably residual background gas contributions. (Not checked carefully as yet; this will also be clearer once channels are assigned.)

Covariance VMI

In the current implementation this calculates covariance images for the electron (x,y) data against another data variable, e.g. ion intensities, on a per-shot basis. The routine uses numpy.corrcoef (https://numpy.org/devdocs/reference/generated/numpy.corrcoef.html) to calculate the correlation of each (x,y) pixel with each column of the other data variable (e.g. multiple ion species). It's quite slow, but amenable to massive parallelisation with some effort.

To keep the code simple, the data is first converted to a sparse array (using the Sparse library); standard 3D array indexing is then possible with a minimal memory footprint. It also gives a nice array summary output.
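The per-pixel loop described above can be sketched as follows (a dense numpy stand-in on synthetic data; the real routine works from the sparse (xc, yc, shot) stack):

```python
import numpy as np

rng = np.random.default_rng(1)
nx, ny, nShots, nChannels = 8, 8, 500, 2

# Synthetic per-shot VMI images and ion channel intensities.
imgs = rng.poisson(1.0, size=(nx, ny, nShots)).astype(float)
ions = rng.poisson(3.0, size=(nShots, nChannels)).astype(float)

# Couple one pixel to ion channel 0, so it should light up in the result.
imgs[3, 4, :] += 2.0 * ions[:, 0]

# Loop over pixels, correlating each pixel's shot series with each ion channel.
corrImg = np.zeros((nx, ny, nChannels))
for i in range(nx):
    for j in range(ny):
        for c in range(nChannels):
            corrImg[i, j, c] = np.corrcoef(imgs[i, j, :], ions[:, c])[0, 1]
```

This is O(nx × ny × nChannels) corrcoef calls, hence the slow runtimes noted below; but each pixel is independent, so it parallelises trivially.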

Generate

In [10]:
# For testing, set filter first...
# Simple filter - set for gas on only
# TODO: implement multiple filter sets here.
data.setFilter(reset = True)
data.setFilter({'unsat':{'evrs':[1,1,70], 'eShot':[0,2000,0], 'desc':'Unsaturated signal, electron counts < 2000.'}})  #'gas on':{'evrs':[1,1,70]}, 'gas off':{'evrs':[0,0,70]}})
# Gas on/off, set event code #70. Unsaturated signal, check electron counts < 2000.
data.filterData()
In [11]:
data.filters
Out[11]:
{'unsat': {'evrs': [1, 1, 70],
  'eShot': [0, 2000, 0],
  'desc': 'Unsaturated signal, electron counts < 2000.'}}
In [12]:
# Generate covariance VMI images.
# This takes a while (~30 mins for the test dataset) with the current implementation, which loops over pixels and correlated datasets (default is the 'intensities' data), and is NOT YET PARALLELISED!
# Note that the default includes x4 downsampling on the (x,y) pixels; alternatively this can be passed as a parameter, as here.
# At x2 downsampling, this takes ~1 hour per run.
data.genCorrVMIXmulti(saveSparse=True, downsampleRatios = [2,2,1])
Generating covariance VMI images for filters: unsat
Generating sparse data for dataset 81
Set sparse array, dims = ['xc', 'yc', 'shot']
Format: coo
Data Type: int64
Shape: (1020, 1020, 73658)
nnz: 17076586
Density: 0.000222833654909523
Read-only: True
Size: 521.1M
Storage ratio: 0.0
Generating covariance VMI images for dataset 81
Generating covariance VMI images for covar data intensities, col=0 of 5
/cds/home/p/phockett/.conda/envs/ps-4.0.8-local-clone/lib/python3.7/site-packages/numpy/lib/function_base.py:2559: RuntimeWarning: invalid value encountered in true_divide
  c /= stddev[:, None]
/cds/home/p/phockett/.conda/envs/ps-4.0.8-local-clone/lib/python3.7/site-packages/numpy/lib/function_base.py:2560: RuntimeWarning: invalid value encountered in true_divide
  c /= stddev[None, :]
Generating covariance VMI images for covar data intensities, col=1 of 5
Generating covariance VMI images for covar data intensities, col=2 of 5
Generating covariance VMI images for covar data intensities, col=3 of 5
Generating covariance VMI images for covar data intensities, col=4 of 5
Generating sparse data for dataset 82
Set sparse array, dims = ['xc', 'yc', 'shot']
Format: coo
Data Type: int64
Shape: (1020, 1020, 75402)
nnz: 22457412
Density: 0.000286270434760342
Read-only: True
Size: 685.3M
Storage ratio: 0.0
Generating covariance VMI images for dataset 82
Generating covariance VMI images for covar data intensities, col=0 of 5
Generating covariance VMI images for covar data intensities, col=1 of 5
Generating covariance VMI images for covar data intensities, col=2 of 5
Generating covariance VMI images for covar data intensities, col=3 of 5
Generating covariance VMI images for covar data intensities, col=4 of 5
Generating sparse data for dataset 83
Set sparse array, dims = ['xc', 'yc', 'shot']
Format: coo
Data Type: int64
Shape: (1020, 1020, 72980)
nnz: 22947146
Density: 0.0003022208872802153
Read-only: True
Size: 700.3M
Storage ratio: 0.0
Generating covariance VMI images for dataset 83
Generating covariance VMI images for covar data intensities, col=0 of 5
Generating covariance VMI images for covar data intensities, col=1 of 5
Generating covariance VMI images for covar data intensities, col=2 of 5
Generating covariance VMI images for covar data intensities, col=3 of 5
Generating covariance VMI images for covar data intensities, col=4 of 5
Generating sparse data for dataset 84
Set sparse array, dims = ['xc', 'yc', 'shot']
Format: coo
Data Type: int64
Shape: (1020, 1020, 74351)
nnz: 27728206
Density: 0.00035845497662352616
Read-only: True
Size: 846.2M
Storage ratio: 0.0
Generating covariance VMI images for dataset 84
Generating covariance VMI images for covar data intensities, col=0 of 5
Generating covariance VMI images for covar data intensities, col=1 of 5
Generating covariance VMI images for covar data intensities, col=2 of 5
Generating covariance VMI images for covar data intensities, col=3 of 5
Generating covariance VMI images for covar data intensities, col=4 of 5
Generating sparse data for dataset 85
Set sparse array, dims = ['xc', 'yc', 'shot']
Format: coo
Data Type: int64
Shape: (1020, 1020, 73406)
nnz: 33486672
Density: 0.0004384702028983485
Read-only: True
Size: 1021.9M
Storage ratio: 0.0
Generating covariance VMI images for dataset 85
Generating covariance VMI images for covar data intensities, col=0 of 5
Generating covariance VMI images for covar data intensities, col=1 of 5
Generating covariance VMI images for covar data intensities, col=2 of 5
Generating covariance VMI images for covar data intensities, col=3 of 5
Generating covariance VMI images for covar data intensities, col=4 of 5
Generating sparse data for dataset 86
Set sparse array, dims = ['xc', 'yc', 'shot']
Format: coo
Data Type: int64
Shape: (1020, 1020, 92623)
nnz: 47847925
Density: 0.000496528203113876
Read-only: True
Size: 1.4G
Storage ratio: 0.0
Generating covariance VMI images for dataset 86
Generating covariance VMI images for covar data intensities, col=0 of 5
Generating covariance VMI images for covar data intensities, col=1 of 5
Generating covariance VMI images for covar data intensities, col=2 of 5
Generating covariance VMI images for covar data intensities, col=3 of 5
Generating covariance VMI images for covar data intensities, col=4 of 5
Generating sparse data for dataset 87
Set sparse array, dims = ['xc', 'yc', 'shot']
Format: coo
Data Type: int64
Shape: (1020, 1020, 79525)
nnz: 58751882
Density: 0.0007100971369679715
Read-only: True
Size: 1.8G
Storage ratio: 0.0
Generating covariance VMI images for dataset 87
Generating covariance VMI images for covar data intensities, col=0 of 5
Generating covariance VMI images for covar data intensities, col=1 of 5
Generating covariance VMI images for covar data intensities, col=2 of 5
Generating covariance VMI images for covar data intensities, col=3 of 5
Generating covariance VMI images for covar data intensities, col=4 of 5

*** TESTING seems OK, but filters not working so far? Could be a super() issue, or accidentally overwriting data?

Covariance VMI results

These are the (electron) VMI results for each run and ToF feature. Use the sliders to select.

In [13]:
# Show images
data.showImgSet(dims = ['xc','yc'])
In [14]:
# The data is stored in the imgStack parameter.
data.imgStack
Out[14]:
<xarray.Dataset>
Dimensions:      (intensities: 5, run: 7, xc: 510, yc: 510)
Coordinates:
  * run          (run) int64 81 82 83 84 85 86 87
  * xc           (xc) int64 0 1 2 3 4 5 6 7 ... 502 503 504 505 506 507 508 509
  * yc           (yc) int64 0 1 2 3 4 5 6 7 ... 502 503 504 505 506 507 508 509
  * intensities  (intensities) int64 0 1 2 3 4
    norm         (run) int64 63130 64625 62557 63722 62924 79393 68166
Data variables:
    unsat        (run, xc, yc, intensities) float64 nan nan ... -4.906e-08
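Individual images can then be pulled from a stack like this with standard xarray selection (sketched here on a small synthetic Dataset with the same dims, not the LW06 results):

```python
import numpy as np
import xarray as xr

# Synthetic stand-in with the same dims as the covariance image stack above.
runs = [81, 82]
ds = xr.Dataset(
    {'unsat': (('run', 'xc', 'yc', 'intensities'),
               np.random.rand(len(runs), 16, 16, 5))},
    coords={'run': runs,
            'xc': np.arange(16), 'yc': np.arange(16),
            'intensities': np.arange(5)})

# Select run 81, ion channel 2: returns a 2D (xc, yc) image.
img = ds['unsat'].sel(run=81, intensities=2)
```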
In [77]:
# Quick save to disk...
# data.corrStack.to_netcdf("LW06_corrStack_test_040121.nc")  # Fails on attribs
data.corrStackCP = data.corrStack.copy()  
data.corrStackCP['unsat'].attrs = {}
data.corrStackCP.to_netcdf("LW06_corrStack_test_040121.nc")  # Runs... 72Mb at 2,2 resample
# ds_disk = xr.open_dataset("LW06_corrStack_test_040121.nc") to load

Versions

In [32]:
# tmo-dev class version
data.__version__
Out[32]:
'0.0.1'
In [33]:
import scooby
scooby.Report(additional=['holoviews', 'xarray', 'sparse'])
Out[33]:
Mon Jan 04 07:08:49 2021 PST
OS Linux CPU(s) 12 Machine x86_64
Architecture 64bit RAM 125.7 GB Environment Jupyter
Python 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08) [GCC 7.5.0]
holoviews 1.13.5 xarray 0.16.1 sparse 0.11.2
numpy 1.19.4 scipy 1.5.3 IPython 7.19.0
matplotlib 3.3.2 scooby 0.5.6